정보과학회 컴퓨팅의 실제 논문지 (KIISE Transactions on Computing Practices)

Korean Title: 단어 손실함수와 반복 페널티를 추가한 트랜스포머 인코더-디코더 제목 생성 모델
English Title: Transformer Encoder-Decoder based Title Generation Model with Word Loss and Repetition Penalty
Author: 성수진 (Su-Jin Seong), 차정원 (Jeong-Won Cha)
Citation: Vol. 27, No. 4, pp. 210-215 (Apr. 2021)
Korean Abstract: A title can be defined as a phrase or sentence that represents a document. To generate document titles, we propose a transformer-based encoder-decoder architecture. The transformer encoder-decoder is pre-trained on a large document collection and then fine-tuned on documents consisting of body-title pairs. In the fine-tuning stage, whose scope is limited to the title generation task, we add a word loss function to increase the generation ratio of words (eojeol) appearing in the input document, and we propose adding a repetition penalty to the model to mitigate the problem of repeatedly generated tokens. In experiments on 25,564 paper records, models applying the word loss function or the repetition penalty individually outperformed the baseline, and the model applying both proposed methods improved Rouge-L by 2.7%.
English Abstract: The title can be defined as a phrase or sentence that represents the document. We propose a transformer encoder-decoder model to generate the title of a document. The transformer model is pre-trained on a large document collection, and fine-tuning is performed on data comprising body-title pairs. In the fine-tuning process, whose scope is limited to the title generation task, a Word Loss is added to increase the generation ratio of words appearing in the input document and the ground-truth title. We also propose adding a Repeat Penalty to the model to reduce the problem of repeatedly generated tokens. In an experiment on data from 25,564 papers, the models that individually applied the Word Loss and the Repeat Penalty outperformed the baseline, and the model that applied both proposed methods improved Rouge-L by 2.7%.
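The abstract names the two additions but the record carries no code. Below is a minimal PyTorch sketch of one plausible reading of each idea: a word loss that adds likelihood pressure toward gold tokens that also occur in the input body, and a CTRL-style repetition penalty that down-weights logits of already-generated tokens at decoding time. Every name and constant here (word_loss, apply_repetition_penalty, the 0.5 weight, the 1.2 penalty) is an illustrative assumption, not the authors' implementation.

```python
# Hedged sketch: NOT the paper's code. One plausible reading of "word loss"
# (an auxiliary term favoring gold tokens that appear in the input document)
# and a CTRL-style "repeat penalty" applied to logits during decoding.
import torch
import torch.nn.functional as F

def word_loss(logits, target_ids, input_vocab_mask, weight=0.5):
    """Cross entropy plus an auxiliary loss on gold title tokens that also
    occur in the input body, raising their generation ratio.

    logits:           (seq_len, vocab_size) decoder outputs
    target_ids:       (seq_len,) gold title token ids
    input_vocab_mask: (vocab_size,) 1.0 where the token occurs in the body
    weight:           assumed mixing coefficient (illustrative)
    """
    ce = F.cross_entropy(logits, target_ids)
    log_probs = F.log_softmax(logits, dim=-1)
    gold_logp = log_probs.gather(1, target_ids.unsqueeze(1)).squeeze(1)
    in_body = input_vocab_mask[target_ids]                 # (seq_len,)
    aux = -(gold_logp * in_body).sum() / in_body.sum().clamp(min=1.0)
    return ce + weight * aux

def apply_repetition_penalty(step_logits, generated_ids, penalty=1.2):
    """Divide (or multiply, if negative) the logit of every token already
    generated, so repeated tokens become less likely at the next step."""
    for tok in set(generated_ids.tolist()):
        if step_logits[tok] > 0:
            step_logits[tok] = step_logits[tok] / penalty
        else:
            step_logits[tok] = step_logits[tok] * penalty
    return step_logits

# Toy usage: random logits over a 100-token vocabulary.
vocab_size = 100
logits = torch.randn(8, vocab_size)          # 8 decoded positions
target = torch.randint(0, vocab_size, (8,))  # fake gold title
body_mask = torch.zeros(vocab_size)
body_mask[target] = 1.0                      # pretend gold tokens occur in the body
print(word_loss(logits, target, body_mask))
print(apply_repetition_penalty(torch.randn(vocab_size), target)[:5])
```

At decoding time the penalty would be applied to each step's logits before softmax or beam scoring; during fine-tuning the word loss would replace plain cross entropy. How the paper weighs the auxiliary term, and whether it also covers ground-truth-title words as the English abstract suggests, is not recoverable from the abstract alone.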
Keywords: transformer encoder-decoder, automatic title generation, word loss, repeat penalty